SUMMARY


The Air Force Officer Qualifying Test (AFOQT) is a paper-and-pencil aptitude battery. Test
results are used to make selection decisions based on Verbal (V) and Quantitative (Q) composite
scores and classification decisions based on Pilot (P) and Navigator-Technical (N-T) composite
scores. Retests are not permitted until after 6 months, unless the applicant can show the first
testing did not reflect his/her true ability. A relatively large number of waivers of the
6-month requirement are granted. This study addressed the benefits of retesting by comparing
retesters with non-retesters and by determining the effects of retaking the AFOQT over various
time intervals.

Subjects were applicants for officer training who tested on Form O of the AFOQT between
October 1981 and December 1983. This included 2,246 retesters and 42,776 non-retesters. The
retesters were divided into four groups who were retested (a) in less than 6 months, (b) from 6
to 11 months, (c) from 12 to 17 months, and (d) after 18 months. T-test results indicated that
retesters' initial scores were significantly lower than those of non-retesters and that they
differed significantly among groups defined on the basis of time interval between test and retest.
Regression analyses were performed to determine whether the four retest groups showed differing
score gains. Retest scores were higher than initial test scores for all groups on all
composites. The groups differed in amount of gain on the P and N-T composites but not on the V
and Q composites. The less-than-6-months group showed the largest gain, followed by the
6-to-11-months group. The 12-to-17-months group showed the least gain.

It was concluded that candidates who obtain a waiver benefit most by retesting, especially
those applying for pilot and navigator training. Whether these findings stem from the
candidates' having a valid reason for a waiver or from learning effects is not clear. Further
research is needed to clarify time and composite effects associated with AFOQT retesting.
However, practice effects would be minimized by allowing retests only after 12 months.
PREFACE


This work was completed under Task 771918, Selection and Classification
Technologies, which is part of a larger effort in Force Acquisition and Distribution.
It was subsumed under work unit number 77191847, Development and Validation of Civilian
and Non-rated Officer Selection Methodologies. This work unit was established in
response to Air Force Regulation (AFR) 35-8, Air Force Military Personnel Testing System.

... express my thanks to the personnel of the Technical Services Division ...
AIR FORCE OFFICER QUALIFYING TEST (AFOQT)
RETESTING EFFECTS


I.  INTRODUCTION 

The purpose of this research and development (R&D) effort was to investigate the effects of
retesting on the Air Force Officer Qualifying Test (AFOQT). The AFOQT is a paper-and-pencil
aptitude test battery used to make selection and classification decisions for Air Force
officers. It was of special interest to determine the effects of retaking the AFOQT in less than
6 months. Air Force Regulation (AFR) 35-8, Air Force Military Personnel Testing System, dated
March 1978, stated that an individual may not retest on the AFOQT in less than 4 months.
However, in April 1983, this regulation was revised to increase the retesting restriction to 6
months. A retest is now permitted in less than 6 months if officially requested through the
Major Command Test Control Officer (MAJCOM TCO) to the Air Force Military Personnel Center
(AFMPC/MPCYPT) and approved. Approval of this waiver depends upon whether an individual can
provide justification suggesting that the results of the first administration of the AFOQT did
not reflect his/her true abilities. Examples of a valid reason for a waiver include illness and
a recent death in the family. Because there was some ambiguity regarding the optimal time
interval between the initial test and the retest, this investigation was undertaken to determine
the effects of retesting over various time intervals.

The consistency of aptitude retest scores depends on (a) the extent to which aptitude changes
and (b) test reliability. In theory, retesting on an aptitude test should result in no changes
in scores if there were no changes in the underlying aptitude and the test was perfectly
reliable. However, it is naive to assume that environmental influences do not affect individuals
between tests to cause changes in aptitude. Humphreys (1978) even suggested that neither
aptitude nor achievement should be used as labels on any test because of a disparity between
theory and practice.

In fact, there is evidence that suggests scores on aptitude tests such as the AFOQT will
improve over time due to both internal and external factors. Christal (1984) administered the
AFOQT to members of an Air Force Reserve Officer Training Corps detachment every other year for
4 years. The greatest gains were in the spatial subtests, followed by the numeric subtests,
while the verbal subtests showed the least amount of gain.

Other studies dealing with the effects of retesting on aptitude tests have compared the
scores of retesters with those who do not retest. Givner, Klintberg, and Hynes (1980) examined
the effects of retesting on the Medical College Admission Test by comparing retesters with
non-retesters. They found that while there was some improvement in scores for retesters, their
initial and retest scores were significantly lower than those of examinees who did not retest.
Similar findings were reported by Alderman (1981) for the Scholastic Aptitude Test (SAT).

Another problem of measuring change in aptitude tests is the reliability of the testing
instrument. According to Cronbach and Furby (1970), any change in test performance, as measured
by subtracting a pretest score from a posttest score, could lead to fallacious conclusions. The
reason is that this change is systematically related to error of measurement. That is,
individuals who score low initially would tend to score higher on any subsequent test whereas
high scorers would tend to score lower. This tendency is called regression toward the mean.
However, problems in interpreting measures of change may be avoided by taking into account
standard error of measurement. The phenomenon of regression toward the mean and how to deal with
it is discussed more fully by Cohen and Cohen (1975).
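Regression toward the mean can be illustrated with a small simulation. The sketch below is
not part of the study: it assumes a stable true score plus independent measurement error on
each administration, and the means, standard deviations, and cutoffs are hypothetical values
chosen only for illustration.

```python
import random


def simulate_retest(n=20000, true_sd=20.0, error_sd=10.0, seed=0):
    """Simulate two administrations: unchanged true score plus independent error."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        true_score = rng.gauss(50.0, true_sd)          # aptitude does not change
        first = true_score + rng.gauss(0.0, error_sd)   # first administration
        second = true_score + rng.gauss(0.0, error_sd)  # second administration
        pairs.append((first, second))
    return pairs


def mean_gain(pairs):
    """Average of (second score - first score) over the given examinees."""
    return sum(second - first for first, second in pairs) / len(pairs)


pairs = simulate_retest()
low_scorers = [p for p in pairs if p[0] < 35.0]    # selected on a low first score
high_scorers = [p for p in pairs if p[0] > 65.0]   # selected on a high first score

print("all examinees:", round(mean_gain(pairs), 2))
print("low scorers:  ", round(mean_gain(low_scorers), 2))
print("high scorers: ", round(mean_gain(high_scorers), 2))
```

Although no simulated examinee's aptitude changes, the group selected for low first scores
gains on average and the group selected for high first scores loses, which is the pattern
Cronbach and Furby caution against interpreting as real change.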


This investigation took two approaches to the problem of retesting. First, the retesters
were divided into samples that had 6-month increments between tests. In this manner, maturation
and/or learning effects could be studied along with the effects of waiving the 6-month
restriction prescribed in AFR 35-8. Secondly, retesters' scores were compared with scores of
those who did not retest. This was done to determine whether the average scores of retesters
differed from the average scores of those who did not retest. It is likely that individuals who
retested represent a self-selected group and therefore can be expected to be different from
non-retesters.


II.  METHOD 

The subjects were examinees tested on AFOQT Form O between October 1981 and December 1983.
The subjects included examinees who retested on the AFOQT during this time as well as those who
tested only once. Further, only Officer Training School (OTS) candidates were included as
subjects. About 9% of these subjects were females, and approximately 80% had at least a college
education and were between the ages of 21 and 27. Since the purpose of this R&D was to analyze
differences among individuals retested below 6 months, those who retested above the 6-month
point, and non-retesters, the subjects were assigned to the following samples:

Sample R1-5 - Individuals who retested less than 6 months after first test (N = 312).

Sample R6-11 - Individuals who retested at least 6 months but less than 12 months after
first test (N = 1,300).

Sample R12-17 - Individuals who retested at least 12 months but less than 18 months after
first test (N = 443).

Sample R18-27 - Individuals who retested 18 months or more after first test (N = 191).
None was retested more than 27 months after first test.

Sample R6-27 - Individuals who retested 6 months or more after first test (N = 1,934).

Sample NR - Individuals who did not retest (N = 42,776).
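The sample definitions above amount to a simple binning rule on the interval between tests.
The following is a minimal sketch of that rule, not the authors' procedure: the function
names are hypothetical, and it assumes the interval is available as a number of months, with
None standing for an examinee who did not retest.

```python
def assign_sample(months_between_tests):
    """Return the sample label for an interval in months (None = did not retest)."""
    if months_between_tests is None:
        return "NR"
    if months_between_tests < 6:
        return "R1-5"
    if months_between_tests < 12:
        return "R6-11"
    if months_between_tests < 18:
        return "R12-17"
    return "R18-27"  # no subject was retested more than 27 months after the first test


def in_pooled_sample(months_between_tests):
    """True for sample R6-27: retested 6 months or more after the first test."""
    return months_between_tests is not None and months_between_tests >= 6


# Sample sizes as reported in the text.
SAMPLE_SIZES = {"R1-5": 312, "R6-11": 1300, "R12-17": 443, "R18-27": 191}

# R6-27 pools the three later samples; adding R1-5 gives all retesters.
pooled_n = SAMPLE_SIZES["R6-11"] + SAMPLE_SIZES["R12-17"] + SAMPLE_SIZES["R18-27"]
all_retesters_n = pooled_n + SAMPLE_SIZES["R1-5"]
print(pooled_n, all_retesters_n)
```

The pooled count agrees with the N reported for sample R6-27, and the total agrees with the
2,246 retesters reported in the Summary.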

The variables used in the analysis were Short Battery scores on five composites derived from
the 16 subtests which make up the AFOQT. Short Battery scores are based on a subset of all the
items in the AFOQT and were used for decision making prior to January 1984. The composites are
Pilot, Navigator-Technical, Academic Aptitude, Verbal, and Quantitative. Two sets of scores were
obtained on each subject except for the non-retesters. Composite scores were percentiles ranging
from 1 through 99. For the purposes of this study, these variables were labeled as follows:

P1 - Pilot composite score on first testing.

P2 - Pilot composite score on second testing.

N-T1 - Navigator-Technical composite score on first testing.

N-T2 - Navigator-Technical composite score on second testing.

AA1 - Academic Aptitude composite score on first testing.

AA2 - Academic Aptitude composite score on second testing.

V1 - Verbal composite score on first testing.

V2 - Verbal composite score on second testing.

Q1 - Quantitative composite score on first testing.

Q2 - Quantitative composite score on second testing.


Statistical techniques used for the data analysis included independent and dependent t-tests
for differences between means. T-tests for related means were computed to detect differences
between the two administrations of the AFOQT. Independent t-tests were used to compare the means
between samples. As 120 t-tests were computed, the accumulation of Type I error in performing
multiple t-tests was a potential problem. This was controlled by adopting a stringent level of
significance (.001).
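The size of the Type I error problem can be gauged with a short calculation. This sketch is
an illustration rather than part of the original analysis: it assumes the 120 tests are
independent (the tests in this study share samples and are not), and the .05 comparison
level is an assumption introduced here for contrast.

```python
def familywise_error(alpha, n_tests):
    """Probability of at least one Type I error across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(familywise_alpha, n_tests):
    """Per-test level that holds the familywise rate at or below familywise_alpha."""
    return familywise_alpha / n_tests


N_TESTS = 120  # number of t-tests reported in the text

print("per-test .05 :", round(familywise_error(0.05, N_TESTS), 4))
print("per-test .001:", round(familywise_error(0.001, N_TESTS), 4))
print("Bonferroni per-test level for familywise .05:",
      round(bonferroni_alpha(0.05, N_TESTS), 6))
```

Under the independence assumption, a per-test level of .001 keeps the chance of at least one
false positive across 120 tests at roughly .11, whereas a per-test level of .05 would make a
false positive nearly certain.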

Linear models analysis was also used to predict what scores the retesters would have received
on the second administration if their initial scores were held constant between samples. Linear
models analysis is a technique in which a full model is compared with a restricted model through
the use of F-tests. If no significant differences are found between the full and the restricted
models, the restricted model can predict the criterion as well as the full model and is therefore
used. If significant differences are found, the full model must be used to predict the
criterion. A complete explanation of this procedure may be found in Ward and Jennings (1973). A
diagram showing the full and restricted models, as well as how the determination was made as to
which models to use in this research, is shown in Appendix A.
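The full-versus-restricted comparison can be sketched in a few lines. The example below is a
deliberately simplified illustration, not the models of Appendix A: the full model fits a
separate mean to each group, the restricted model fits one common mean, and the data and
group names are hypothetical.

```python
def sse(values, prediction):
    """Sum of squared errors when every value is predicted by one constant."""
    return sum((v - prediction) ** 2 for v in values)


def f_statistic(sse_full, sse_restricted, params_full, params_restricted, n_obs):
    """F ratio comparing a restricted model against the full model it is nested in."""
    df_numerator = params_full - params_restricted
    df_denominator = n_obs - params_full
    return ((sse_restricted - sse_full) / df_numerator) / (sse_full / df_denominator)


# Hypothetical score gains for two groups of examinees.
groups = {"A": [12.0, 15.0, 14.0, 13.0], "B": [9.0, 8.0, 11.0, 10.0]}
all_values = [v for values in groups.values() for v in values]

# Restricted model: a single mean for everyone (1 parameter).
grand_mean = sum(all_values) / len(all_values)
sse_restricted = sse(all_values, grand_mean)

# Full model: a separate mean for each group (2 parameters).
sse_full = sum(sse(values, sum(values) / len(values)) for values in groups.values())

f_value = f_statistic(sse_full, sse_restricted, 2, 1, len(all_values))
print("F(1, 6) =", round(f_value, 2))
```

A large F relative to the critical value for the stated degrees of freedom means the group
terms cannot be dropped, so the full model is retained; otherwise the restricted model is
used.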


III.  RESULTS 

Table 1 and Figure B-1 show the mean AFOQT composite scores for each sample. Comparing
sample R1-5 and sample R6-27, percentile scores were found to be higher when individuals
retested 6 months or more after the first test, as opposed to less than 6 months after the first
test. A second general trend is seen when one compares mean scores among retest samples that
were broken into 6-month increments. Sample R1-5 generally had the lowest mean scores of all
samples on both administrations. Means for sample R6-11 were higher than those of sample
R1-5, but sample R12-17 means were generally lower than those of sample R6-11. The highest
mean scores were found in sample NR while sample R18-27 had the largest means among the
retesters.

Table 1. Mean AFOQT Composite Scores

                                          Samples
                NR          R1-5       R6-11        R12-17     R18-27     R6-27
Composites  (N = 42,776)  (N = 312)  (N = 1,300)  (N = 443)  (N = 191)  (N = 1,934)

P1             46.97        30.04      33.72        33.54      41.64      34.46
P2                          44.56      46.70        43.63      51.96      46.52
N-T1           46.59        27.66      31.28        30.32      39.74      31.90
N-T2                        40.83      43.28        39.76      49.19      43.06
AA1            47.67        25.28      28.08        28.14      38.41      29.11
AA2                         35.98      38.87        37.09      47.66      39.33
V1             52.47        33.21      35.34        36.26      45.00      36.51
V2                          43.06      44.47        44.30      52.79      45.25
Q1             44.49        24.26      27.23        26.25      36.30      27.90
Q2                          34.06      37.31        34.10      44.59      37.29

Independent t-tests were computed among the means of all samples shown in Table 1. The
results, reported as level of significance obtained, are presented in Table 2. As shown in the
first five rows of the table, mean scores for non-retesters were significantly higher than those
for retesters in all samples on both the initial and second administration of the AFOQT. Two
exceptions were noted: the second administration of the Pilot composite and sample R18-27.
Only in sample R12-17 was P2 significantly lower than the non-retesters' Pilot score. The
other exception was that the mean scores of sample R18-27 on all composites of the second
administration did not differ from those obtained by sample NR.


Table 2. Level of Significance Between Samples' Mean AFOQT Scores

AFOQT Composites: P1, P2, N-T1, N-T2, AA1, AA2, V1, V2, Q1, Q2

Sample Comparisons

NR vs. R1-5:        .088, .001, .001, .001
NR vs. R6-11:       .704, .001, .001
NR vs. R12-17:      [illegible], .001, .001, .001
NR vs. R18-27:      .012, [illegible], .185, .996, .872, .959
NR vs. R6-27:       .449, .001, .001, .001
R1-5 vs. R6-11:     .175, .119, .024, .050, .165, .394, .024, .031
R1-5 vs. R12-17:    .029, .627, .093, .569, .053, .526, .090, .523, .196, .982
R1-5 vs. R18-27:    .001, .001
R1-5 vs. R6-27:     .208, .150, .022, .029, .173, .029
R6-11 vs. R12-17:   .881, .028, .420, .011, .955, .169, .492, .905, .395, .015
R6-11 vs. R18-27:   .001, .001
R12-17 vs. R18-27:  .001, .001, .001

Next, comparisons focused on examinees who retested in less than 6 months (sample R1-5).
As shown in the middle part of Table 2, two trends emerged from comparisons among sample R1-5
and the other three retest samples. Generally, scores for sample R1-5 were lower on the
initial administration than those of other retesters. However, on the second administration,
mean scores for examinees who retested in less than 6 months were significantly lower only than
those of examinees who retested after 18 months or more.

The final set of comparisons reported in Table 2 were made among the three retest samples to
whom the AFOQT was readministered at the 6-month point or later. Only the comparison between
samples R6-11 and R12-17 showed no significant differences in the initial administration of
the AFOQT on all five composites. However, both samples scored significantly lower on most AFOQT
composites on the initial and second administration than did sample R18-27.

Table 3 shows the mean increases in composite scores for all samples. In every case, retest
means were considerably higher than original test means. The largest increases generally
occurred in sample R1-5. Across all samples, the mean increase was the greatest for the Pilot
and Navigator-Technical composites. The test-retest correlations for all retesters on each
composite were as follows: Pilot = .812, Navigator-Technical = .852, Academic Aptitude = .853,
Verbal = .880, and Quantitative = .775. These reliabilities were lower than expected, most
likely due to the restricted variability in the sample.
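The effect of restricted variability on a correlation can be illustrated with the standard
correction for direct range restriction (Thorndike's Case 2). This is a hedged sketch, not a
computation from the study: the report does not give the ratio of unrestricted to restricted
standard deviations, so the ratio used below is a hypothetical value, and applying this
correction to a test-retest correlation is itself an assumption made for illustration.

```python
import math


def correct_for_range_restriction(r_restricted, sd_ratio):
    """Thorndike Case 2 correction.

    sd_ratio is the unrestricted standard deviation divided by the restricted
    standard deviation of the variable on which selection occurred.
    """
    r = r_restricted
    k = sd_ratio
    return (r * k) / math.sqrt(1.0 - r * r + r * r * k * k)


observed_r = 0.775          # Quantitative test-retest correlation from the text
hypothetical_sd_ratio = 1.2  # assumed for illustration only

corrected = correct_for_range_restriction(observed_r, hypothetical_sd_ratio)
print("observed:", observed_r, "corrected:", round(corrected, 3))
```

With a ratio of 1 the correlation is unchanged; any ratio above 1 raises the estimate, which
is the direction of the effect the text attributes to the restricted sample.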


Table 3. Mean Percentile Increase of AFOQT Scores
Between First and Second Administrations

Sample      N       Pilot    Nav-Tech    Academic Apt    Verbal    Quantitative

R1-5         312    14.52     13.17         10.71          ...         ...
R6-11       1300    12.98      ...          10.79          ...         ...
R12-17       443    10.09      9.44          8.95          8.03        7.85
R18-27       191    10.32      9.45          9.25          7.79        8.29
R6-27       1934    12.05     11.16         10.22          8.74        9.40

Note. Some of these figures vary slightly from Table 1 due to rounding,
but all increases were significant at the .001 level.
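The mean increases can be cross-checked against Table 1 by subtracting each sample's
first-administration mean from its second-administration mean. The sketch below transcribes
the Table 1 means; the dictionary layout and function name are hypothetical, and because the
inputs are already rounded, results may differ from the published increases by about .01.

```python
# (first testing, second testing) mean percentile scores transcribed from Table 1.
TABLE_1 = {
    "R1-5": {"Pilot": (30.04, 44.56), "Nav-Tech": (27.66, 40.83),
             "Academic Apt": (25.28, 35.98), "Verbal": (33.21, 43.06),
             "Quantitative": (24.26, 34.06)},
    "R6-11": {"Pilot": (33.72, 46.70), "Nav-Tech": (31.28, 43.28),
              "Academic Apt": (28.08, 38.87), "Verbal": (35.34, 44.47),
              "Quantitative": (27.23, 37.31)},
    "R12-17": {"Pilot": (33.54, 43.63), "Nav-Tech": (30.32, 39.76),
               "Academic Apt": (28.14, 37.09), "Verbal": (36.26, 44.30),
               "Quantitative": (26.25, 34.10)},
    "R18-27": {"Pilot": (41.64, 51.96), "Nav-Tech": (39.74, 49.19),
               "Academic Apt": (38.41, 47.66), "Verbal": (45.00, 52.79),
               "Quantitative": (36.30, 44.59)},
    "R6-27": {"Pilot": (34.46, 46.52), "Nav-Tech": (31.90, 43.06),
              "Academic Apt": (29.11, 39.33), "Verbal": (36.51, 45.25),
              "Quantitative": (27.90, 37.29)},
}


def mean_increases(table):
    """Second-administration mean minus first-administration mean, per composite."""
    return {
        sample: {name: round(second - first, 2)
                 for name, (first, second) in composites.items()}
        for sample, composites in table.items()
    }


increases = mean_increases(TABLE_1)
for sample, row in increases.items():
    print(sample, row)
```

In every sample the Pilot and Navigator-Technical differences are the two largest, which
matches the pattern described in the text.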
Since the majority of samples showed significant differences among their initial scores, it
became necessary to compute regression analyses to determine what their second scores would have
been if their first scores had been the same. By doing this, the results could be analyzed as
though each sample were drawn from the same population.

These analyses revealed curvilinear relationships on all five composites between the initial
score and the second predicted score (see Appendix B). Group effects were found in the Pilot,
Navigator-Technical, and Academic Aptitude composites but not in the Verbal and Quantitative
composites. An interaction effect was apparent only with Academic Aptitude. The regression
models found to be significant for the five composites were as follows: Model 3 for Pilot and
Navigator-Technical, Model 1 for Academic Aptitude, and Model 2 for Verbal and Quantitative (see
Appendix A).

The increase of scores at the 25th, 50th, and 75th percentiles, as predicted from the
regression analysis, is shown in Table 4. It should be noted that a noticeable difference in
predicted increases exists for those samples who retested in less than 12 months versus those who
retested 12 months or more after the first administration. The largest predicted increases of
the Pilot, Navigator-Technical, and Academic Aptitude composites occurred for samples R1-5 and
R6-11. Another finding was that the largest predicted increases were observed below the 50th
percentile across all composites. This was entirely expected.

Table 4. Predicted Percentile Score Increases
on AFOQT Composites by Retest Interval

[Table body not recoverable. Surviving headers: AFOQT Percentile; Retester Samples R1-5,
R12-17, R18-27; Navigator-Technical.]


IV.  DISCUSSION 


All retesting produced significant increases in subjects' AFOQT scores. It then became
interesting to speculate on what influences caused the increase in scores. Possible causes
include regression to the mean, maturation, learning (i.e., practice effects and coaching), or
different motivations for retesting. Each of these will be discussed in turn.

Regression to the mean cannot explain the magnitude of score increase observed in the
retesters' scores. The AFOQT is a highly reliable test instrument with reliabilities ranging
from .689 to .922 across the 16 subtests. Although the standard error of measurement (5.92)
explains most of the change in the smallest mean percentile score increase (7.79), there are
still other factors which may account for the differences.

It is highly unlikely that maturation could have caused the score increases in all samples.
If that were the case, the score increases would have been greater as the time interval between
tests increased. Additionally, it is doubtful that maturation could have occurred in less than 6
months. Maturation may have been a factor with the score increases in sample R18-27, however.
This was the only sample whose second AFOQT scores equaled the non-retesters' scores.

When examining the effects of learning, two areas need to be considered. One is the practice
effect of having recently taken the test, and the other is the effect of possible coaching
between tests. In a study by Johnson, Flinn, and Tyer (1979), it was shown that spatial skills
significantly improve with practice. The data in the present study showed the greatest gains in
the Pilot and Navigator-Technical composites, which are largely composed of spatial tests.
Furthermore, this effect was most pronounced with those subjects who retested in less than 12
months. Verbal and Quantitative scores were not as susceptible to change. A less likely cause
of the increase in scores would be coaching. DerSimonian and Laird (1983) reported small but
positive effects of coaching on SAT scores. That is, they changed true scores by teaching
subject matter, not testwiseness tricks. Since the subjects in this study were as likely to be
coached before the initial administration of the AFOQT as between administrations, this probably
was not the cause of the increase.

Motivation for retaking the AFOQT may have caused the increase in scores. Because the
retesters had relatively low scores, their motivation surely was to increase their scores. The
motivation to retake the AFOQT in sample R18-27 may have differed from that of the other
retester samples. According to AFR 35-8, individuals are required to retest if their scores are
more than 2 years old and they are applying for commissioning or flying training. Therefore,
sample R18-27 score increases may have been caused either by their differing motivation or by
maturation between tests.

The regression analysis accomplished to equate the initial scores on all samples supported
the contention that learning did occur in the Pilot and Navigator-Technical composites. Some
slight maturation effects were also shown in these composites in that sample R18-27 posted
higher retest scores than did sample R12-17. No group differences were found in the Verbal and
Quantitative composites, which leads to the conclusion that these are the most stable of the
aptitude indicators.

When the results of the regression analysis were compared with the obtained mean increase of
scores in Table 3, two observations were made. First, the predicted increases corresponded with
the obtained increases in the Pilot and Navigator-Technical composites. That is, the largest
increases were found in sample R1-5, followed by samples R6-11, R18-27, and R12-17.
Furthermore, there was a noticeable difference in score increases in these composites between
those who retested above and below 12 months. Second, the predicted increase in scores was a
function of not only time between test administrations but also initial score level. That is,
greater score increases may be expected at the 25th percentile than the 75th percentile. This
was particularly relevant because the mean initial scores for all retesters across all of the
composites were below the 50th percentile.

When mean scores of sample NR were compared with mean scores of all of the retester samples,
the results showed that sample NR generally scored significantly higher than the retesters on
both administrations. This indicates that personnel who decided to retest probably did so to
increase their scores. Despite the increase of their scores, they would continue to be
discernible from non-retesters, who posted higher scores.

The only exceptions to this finding would be in explaining the data from sample R18-27 and
the second Pilot administration. Sample R18-27 may have had a different motive in retaking the
AFOQT in that they probably did so to keep their scores current. P2 scores were not different
from sample NR's Pilot scores, which seems to indicate that some learning occurred in the Pilot
composite subtests.


V.  CONCLUSIONS 

Two questions were addressed in this study. One concerned the effects of retesting over
various time intervals. The other was whether retesters were similar to non-retesters. The
following findings were obtained.

First, regardless of the time interval between administrations of the AFOQT, increases
occurred. In the case of waiving the 6-month retesting restriction, whether the increase was due
to learning or to the examinees' being valid cases for the waiver is debatable. The regression
analysis showed that the highest gains were found for samples R1-5 and R6-11. Moderate
increases were also found for sample R18-27, which indicated that some slight maturation
effects possibly occurred. However, if a goal of retesting is to minimize the effects of
practice, then the minimum time to allow retesting is 12 months after the first administration.
This is especially critical for those individuals applying for pilot or navigator training.

Also, the linear models analysis revealed that the amount of gain depends not only on time
between tests but also on initial score. Since most retesters score low initially, large
improvements in AFOQT scores may be expected. However, only in marginal cases would retaking the
AFOQT substantially improve an individual's chances of being selected into OTS given the
competitive nature of today's recruiting environment.

Finally, individuals who retested were a highly self-selected group. Although retesters'
scores were improved by taking the AFOQT again, they still did not equal non-retesters' scores,
and retesters were discernible from non-retesters. Therefore, even though large increases in
scores may be expected with retesting, those increases would not be sufficient in most cases to
change an applicant's chances of being selected into OTS. However, given the large increases in
Pilot scores, especially of those who retested in less than 12 months, pilot classification
decisions may be altered by retesting.

Future research is indicated from these results. In a follow-up study, subjects should be
randomly assigned to four groups after initial administration of the AFOQT. One group would
retest shortly after the first test. The other groups would retest 6, 12, and 18 months later.
This paradigm would control different motivation factors for retesting (i.e., illness on the
first test, keeping test scores current) while measuring learning and maturation effects.
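The proposed design turns on random assignment to retest intervals. A minimal sketch of such
an assignment follows; the function name, the use of 0 months to stand for "shortly after the
first test," and the round-robin allocation after shuffling are assumptions made for
illustration.

```python
import random


def assign_retest_groups(subject_ids, intervals=(0, 6, 12, 18), seed=None):
    """Randomly split subjects into near-equal groups, one per retest interval.

    intervals are months until retest; 0 stands for "shortly after the first test".
    """
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)  # random order removes any self-selection into groups
    groups = {months: [] for months in intervals}
    for index, subject in enumerate(shuffled):
        groups[intervals[index % len(intervals)]].append(subject)
    return groups


groups = assign_retest_groups(range(10), seed=1)
for months, members in groups.items():
    print(months, "months:", sorted(members))
```

Because the interval is imposed rather than chosen, differences among groups in retest gain
could be attributed to the interval itself rather than to the examinee's reason for
retesting.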
* sig = significant; ns = not significant

Figure A-1. Sequential F-test Comparisons
[Figure: Initial versus Retest Score Comparisons. Panels: Pilot Composite Regression Lines;
Navigator-Technical Composite Regression; Verbal Composite Regression Line. Axis: Initial
Score. Legend: Sample.]


